Module minder_utils.Examples.Getting Started
Getting Started
To use the package, please:
1. Access RDS to download the necessary files, including mapping.json, validated_date.py and random_id_to_research_id.json (optional, for TIHM data).
2. Access the research portal to get the access token.
3. Optionally, download the TIHM data from the research portal.
4. Use the ```Get Started.ipynb``` notebook to set the paths to the files and set the token.
5. Check the examples in ```Instruction.ipynb```.
Introduction
Currently, the package provides the following functions:
- download: download and refresh the data
- download weekly data: download and refresh the data weekly
- loading the raw data: the data (both TIHM and DRI) will be reformatted to a standard format
- loading the weekly data: the data (both TIHM and DRI) will be processed into a weekly format, with engineered features provided
- pre-processing data: the data (both TIHM and DRI) will be processed (aggregated hourly/daily) into arrays ready for machine learning models (note that normalisation techniques need to be specified); a toy illustration follows this list
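As a toy illustration of the standard format and the hourly aggregation mentioned above, here is a small pandas sketch. The column names follow the ```['id', 'time', 'location', 'value']``` format described in the next section; the toy data and the aggregation choices are assumptions for illustration only, not the package's own pre-processing:
```
import pandas as pd

# Toy data in the standard ['id', 'time', 'location', 'value'] format.
# The ids, locations and values here are made up for illustration only.
raw = pd.DataFrame({
    'id': ['p1', 'p1', 'p1', 'p2'],
    'time': pd.to_datetime(['2021-10-10 08:05', '2021-10-10 08:40',
                            '2021-10-10 09:10', '2021-10-10 08:20']),
    'location': ['kitchen', 'kitchen', 'bathroom', 'kitchen'],
    'value': [1, 1, 1, 1],
})

# Aggregate hourly per id and location, then turn the result into an array.
# Normalisation is deliberately left out, as noted above.
hourly = (raw
          .groupby(['id', 'location', pd.Grouper(key='time', freq='h')])['value']
          .sum()
          .unstack('location', fill_value=0))
X = hourly.to_numpy()
print(hourly)
print(X.shape)
```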
Here is an overview of this package:
1. ```Downloader```

   **Intro**: ```Downloader``` can download all types of data and refresh the downloaded data.

   **Usage**:
   ```
   from download.download import Downloader

   Downloader().export(since='2021-10-10', until='2021-10-12',
                       reload=True, save_path='./data/activity/', categories=['raw_activity_pir'])
   ```
2. ```Formatting```

   **Intro**: ```Formatting``` will reformat the data into a dataframe containing ```['id', 'time', 'location', 'value']```. Furthermore, it can automatically convert the TIHM data to the same format as the DRI data and concatenate them (optional).

   **Usage**:
   ```
   from minder_utils.formatting import Formatting

   formater = Formatting()
   print(formater.activity_data)
   ```
3. ```Feature_engineer```

   **Intro**: ```Feature_engineer``` will reformat the data into a dataframe containing ```['id', 'time', 'location', 'value', 'week']```, where ```week``` is the week index of a date, e.g. the week for 2020.01.01 is 2001 (the first week of year 20). Furthermore, it can statistically analyse the weekly data. Please check the documentation for details.

   **Usage**:
   ```
   from minder_utils.feature_engineering import Feature_engineer
   from minder_utils.formatting import Formatting

   fe = Feature_engineer(Formatting())
   print(fe.activity)
   ```
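The week index described above can be pictured with a short, self-contained sketch. This is only an illustration assuming ISO week numbering; it is not the package's own implementation, whose exact convention may differ:
```
from datetime import date

# Illustrative only: one way to build the week index described above
# (two digits of the year followed by the week number). ISO week
# numbering is an assumption about the package's convention.
def week_index(d: date) -> int:
    iso_year, iso_week, _ = d.isocalendar()
    return (iso_year % 100) * 100 + iso_week

print(week_index(date(2020, 1, 1)))  # -> 2001, matching the example above
```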
Configurations
In ```./configurations```, there are two editable files:
- ```config_dri.yaml```: configurations for the data
- ```config_engineering_feature.yaml```: configurations for the feature engineering
Here are some important attributes you may change according to your needs. In the table below, dri refers to ```config_dri.yaml``` and fe refers to ```config_engineering_feature.yaml```.
config name | which file | description |
---|---|---|
save_path | both | the path to save the data so you won't need to process it next time. |
save_name | both | the name used to save the data. NOTE: it is recommended not to change this. If you want to save the data under a different name, please edit the ```save``` attribute under the corresponding item. |
verbose | both | print messages to the console |
refresh | both | whether to re-process the data or reuse the previously processed data. NOTE: if you use ```weekly_loader``` to refresh the data every week, please set this to True |
add_tihm | dri | concatenate the TIHM data to the DRI data. |
nocturia | fe | the time range used to calculate bathroom activity during the night. Times outside this range are used to calculate bathroom activity during the day. |
activity | fe | which attributes are returned when calculating the weekly activities (used for training the models) |
The other attributes are mostly self-explanatory.
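If you prefer to tweak these files programmatically rather than by hand, a configuration file can be read and written back with PyYAML. This is only a sketch: the flat, top-level key layout assumed below may not match the real files, so check the file contents first.
```
import yaml  # PyYAML

# Sketch only: flip two of the flags described in the table above and save
# the file back. The flat key layout is an assumption about the YAML files.
path = './configurations/config_dri.yaml'
with open(path) as f:
    cfg = yaml.safe_load(f)

cfg['refresh'] = True    # re-process the data instead of reusing the cache
cfg['add_tihm'] = True   # concatenate the TIHM data to the DRI data

with open(path, 'w') as f:
    yaml.safe_dump(cfg, f)
```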
Queries
If you have any problems while using the package, please reach out to us on Slack or create an issue on the Git repository. Thanks!